1,114 research outputs found
Understanding Convolutional Neural Networks in Terms of Category-Level Attributes
Abstract. It has recently been reported that convolutional neural networks (CNNs) perform well in many image recognition tasks, significantly outperforming previous approaches that are not based on neural networks, particularly for object category recognition. These performances are arguably owing to their ability to discover better image features for recognition tasks through learning, resulting in the acquisition of better internal representations of the inputs. However, despite the good performances, it remains an open question why CNNs work so well and how they can learn such good representations. In this study, we conjecture that the learned representation can be interpreted as category-level attributes that have good properties. We conducted several experiments using the AwA (Animals with Attributes) dataset and a CNN trained for ILSVRC-2012 in a fully supervised setting to examine this conjecture. We report that there exist units in the CNN that can predict some of the 85 semantic attributes fairly accurately, along with the detailed observation that this holds only for visual attributes and not for non-visual ones. It is natural to think that the CNN may discover not only semantic attributes but also non-semantic ones (or ones that are difficult to express in words). To explore this possibility, we perform zero-shot learning by regarding the activation patterns of upper layers as attributes describing the categories. The result shows that this outperforms the state of the art by a significant margin.
Deep Neural Networks - A Brief History
Introduction to deep neural networks and their history. Comment: 14 pages, 14 figures
Fast Spectral Clustering Using Autoencoders and Landmarks
In this paper, we introduce an algorithm for performing spectral clustering
efficiently. Spectral clustering is a powerful clustering algorithm that
suffers from high computational complexity, due to eigen decomposition. In this
work, we first build the adjacency matrix of the corresponding graph of the
dataset. To build this matrix, we only consider a limited number of points,
called landmarks, and compute the similarity of all data points with the
landmarks. Then, we present a definition of the Laplacian matrix of the graph
that enables us to perform eigen decomposition efficiently, using a deep
autoencoder. The overall complexity of the algorithm for eigen decomposition is
O(nm), where n is the number of data points and m is the number of
landmarks. At last, we evaluate the performance of the algorithm in different
experiments. Comment: 8 pages, accepted at the 14th International Conference on Image Analysis
and Recognition
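The landmark similarity step described above can be sketched as follows. The RBF kernel, uniform random landmark selection, and all names here are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def landmark_affinity(X, n_landmarks=10, gamma=1.0, seed=0):
    """Affinity between all n points and m randomly chosen landmarks.

    The RBF kernel and uniform landmark sampling are illustrative
    assumptions; the paper's exact choices may differ.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    landmarks = X[idx]                                  # (m, d)
    # Squared Euclidean distance from every point to every landmark.
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)                          # (n, m), in (0, 1]

X = np.random.default_rng(1).normal(size=(200, 5))
A = landmark_affinity(X, n_landmarks=12)
print(A.shape)  # (200, 12): cost grows with n*m, not the full n*n graph
```

The point of the construction is that the n x m affinity matrix replaces the full n x n adjacency matrix, which is what keeps the eigen decomposition step tractable.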
ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids
We introduce an unsupervised feature learning approach that embeds 3D shape
information into a single-view image representation. The main idea is a
self-supervised training objective that, given only a single 2D image, requires
all unseen views of the object to be predictable from learned features. We
implement this idea as an encoder-decoder convolutional neural network. The
network maps an input image of an unknown category and unknown viewpoint to a
latent space, from which a deconvolutional decoder can best "lift" the image to
its complete viewgrid showing the object from all viewing angles. Our
class-agnostic training procedure encourages the representation to capture
fundamental shape primitives and semantic regularities in a data-driven
manner---without manual semantic labels. Our results on two widely-used shape
datasets show 1) our approach successfully learns to perform "mental rotation"
even for objects unseen during training, and 2) the learned latent space is a
powerful representation for object recognition, outperforming several existing
unsupervised feature learning methods. Comment: To appear at ECCV 2018
Learning Temporal Transformations From Time-Lapse Videos
Based on life-long observations of physical, chemical, and biological phenomena
in the natural world, humans can often easily picture in their minds what an
object will look like in the future. But, what about computers? In this paper,
we learn computational models of object transformations from time-lapse videos.
In particular, we explore the use of generative models to create depictions of
objects at future times. These models explore several different prediction
tasks: generating a future state given a single depiction of an object,
generating a future state given two depictions of an object at different times,
and generating future states recursively in a recurrent framework. We provide
both qualitative and quantitative evaluations of the generated results, and
also conduct a human evaluation to compare variations of our models. Comment: ECCV 2016
Sampled Weighted Min-Hashing for Large-Scale Topic Mining
We present Sampled Weighted Min-Hashing (SWMH), a randomized approach to
automatically mine topics from large-scale corpora. SWMH generates multiple
random partitions of the corpus vocabulary based on term co-occurrence and
agglomerates highly overlapping inter-partition cells to produce the mined
topics. While other approaches define a topic as a probabilistic distribution
over a vocabulary, SWMH topics are ordered subsets of such vocabulary.
Interestingly, the topics mined by SWMH underlie themes from the corpus at
different levels of granularity. We extensively evaluate the meaningfulness of
the mined topics both qualitatively and quantitatively on the NIPS (1.7K
documents), 20 Newsgroups (20K), Reuters (800K) and Wikipedia (4M) corpora.
Additionally, we compare the quality of SWMH with Online LDA topics for
document representation in classification. Comment: 10 pages, Proceedings of the Mexican Conference on Pattern
Recognition 2015
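As background for the min-hashing machinery mentioned above, here is a minimal sketch of plain (unweighted) MinHash estimating Jaccard similarity between token sets. SWMH's weighting, sampling, and partition agglomeration are not shown; all names are illustrative:

```python
import random
import zlib

def minhash_signature(tokens, n_hashes=128, seed=0):
    """Plain (unweighted) MinHash signature of a token set.

    SWMH builds on weighted min-hashing over term co-occurrence; this
    sketch shows only the underlying min-hash primitive.
    """
    rng = random.Random(seed)
    p = (1 << 61) - 1                       # a large Mersenne prime
    params = [(rng.randrange(1, p), rng.randrange(p))
              for _ in range(n_hashes)]
    # Deterministic token hashes (crc32 avoids Python's randomized hash()).
    xs = [zlib.crc32(t.encode()) for t in tokens]
    # Each hash function h(x) = (a*x + b) mod p contributes its minimum.
    return [min((a * x + b) % p for x in xs) for a, b in params]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

s1 = minhash_signature({"neural", "network", "deep", "learning"})
s2 = minhash_signature({"neural", "network", "graph", "learning"})
est = estimated_jaccard(s1, s2)  # true Jaccard here is 3/5 = 0.6
```

Two sets that share min-hash values under many hash functions are likely to overlap heavily, which is the property SWMH exploits when agglomerating overlapping cells into topics.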
VConv-DAE: Deep Volumetric Shape Learning Without Object Labels
With the advent of affordable depth sensors, 3D capture becomes more and more
ubiquitous and already has made its way into commercial products. Yet,
capturing the geometry or complete shapes of everyday objects using scanning
devices (e.g. Kinect) still comes with several challenges that result in noise
or even incomplete shapes. Recent success in deep learning has shown how to
learn complex shape distributions in a data-driven way from large-scale 3D CAD
model collections and to utilize them for 3D processing on volumetric
representations, thereby circumventing problems of topology and tessellation.
Prior work has shown encouraging results on problems ranging from shape
completion to recognition. We provide an analysis of such approaches and
discover that training, as well as the resulting representation, is strongly
and unnecessarily tied to the notion of object labels. Thus, we propose a
fully convolutional volumetric auto-encoder that learns a volumetric
representation from noisy data by estimating voxel occupancy grids. The
proposed method
outperforms prior work on challenging tasks like denoising and shape
completion. We also show that the obtained deep embedding gives competitive
performance when used for classification and promising results for shape
interpolation.
Analysis of dropout learning regarded as ensemble learning
Deep learning is the state of the art in fields such as visual object
recognition and speech recognition. It uses a large number of layers and a
huge number of units and connections, so overfitting is a serious problem. To
avoid this problem, dropout learning has been proposed. Dropout learning
neglects some inputs and hidden units in the learning process with a
probability p, and the neglected inputs and hidden units are then combined
with the learned network to express the final output. We find that the process
of combining the neglected hidden units with the learned network can be
regarded as ensemble learning, so we analyze dropout learning from this point
of view. Comment: 9 pages, 8 figures, submitted to a conference
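The ensemble view described above can be illustrated with a small sketch: averaging the outputs of many randomly masked subnetworks approaches the deterministic test-time scaling by 1 - p. This is a minimal illustration, not the paper's analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(h, p, rng):
    """Training pass: drop each unit with probability p."""
    mask = rng.random(h.shape) >= p
    return h * mask

def dropout_test(h, p):
    """Test pass: keep all units, scaled by (1 - p), which approximates
    the average output over all masked subnetworks."""
    return h * (1 - p)

h = np.ones(10_000)          # activations of one hidden layer
p = 0.5
# Monte-Carlo average over many sampled subnetworks ...
mc = np.mean([dropout_train(h, p, rng) for _ in range(1000)], axis=0)
# ... matches the deterministic test-time scaling (the ensemble view).
print(np.allclose(mc.mean(), dropout_test(h, p).mean(), atol=0.01))  # True
```

The match between the Monte-Carlo average and the scaled deterministic pass is exactly what motivates interpreting dropout as an (approximate) ensemble of exponentially many subnetworks.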
Intestinal Parasites Classification Using Deep Belief Networks
Currently, billions of people are infected by intestinal parasites
worldwide. Diseases caused by such infections constitute a public
health problem in most tropical countries, leading to physical and mental
disorders, and even death to children and immunodeficient individuals. Although
subjected to high error rates, human visual inspection is still in charge of
the vast majority of clinical diagnoses. In the past years, some works
addressed intelligent computer-aided intestinal parasites classification, but
they usually suffer from misclassification due to similarities between
parasites and fecal impurities. In this paper, we introduce Deep Belief
Networks to the context of automatic intestinal parasites classification.
Experiments conducted over three datasets composed of eggs, larvae, and
protozoa provided promising results, even considering unbalanced classes and
also fecal impurities.
A Layer-Wise Information Reinforcement Approach to Improve Learning in Deep Belief Networks
With the advent of deep learning, the number of works proposing new methods
or improving existing ones has grown exponentially in recent years. In this
scenario, "very deep" models have emerged, since they are expected to extract
more intrinsic and abstract features while supporting better performance.
However, such models suffer from the vanishing gradient problem, i.e.,
backpropagated values become too close to zero in their shallower layers,
ultimately causing learning to stagnate. This issue was overcome in the
context of convolutional neural networks by creating "shortcut connections"
between layers, in the so-called deep residual learning framework. Nonetheless, a
very popular deep learning technique called the Deep Belief Network still suffers
from vanishing gradients when dealing with discriminative tasks. Therefore, this
paper proposes the Residual Deep Belief Network, which reinforces information
layer by layer to improve feature extraction and knowledge retention,
supporting better discriminative performance. Experiments conducted over three
public datasets demonstrate its robustness on the task of binary image
classification.
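The shortcut-connection idea the abstract refers to (from residual learning in CNNs, not the paper's DBN-specific reinforcement scheme) can be sketched minimally: the layer input is added back to the layer output, so information and gradients have a direct path past the transformation.

```python
import numpy as np

def layer(x, W):
    """One plain layer: ReLU(W x)."""
    return np.maximum(0.0, W @ x)

def residual_layer(x, W):
    """Shortcut connection: the input is added back to the layer output,
    giving gradients a direct path to shallower layers."""
    return layer(x, W) + x

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W = np.zeros((4, 4))          # a 'dead' layer that alone would output zeros
print(np.allclose(residual_layer(x, W), x))  # True: the shortcut preserves x
```

Even when the transformation contributes nothing, the identity path carries the signal through, which is why stacks of such blocks resist the vanishing-gradient stagnation described above.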